Automatic Selection of Noun Phrases as Document Descriptors in an FCA-Based Information Retrieval System

Authors

  • Juan M. Cigarrán
  • Anselmo Peñas
  • Julio Gonzalo
  • M. Felisa Verdejo
Abstract

Automatic attribute selection is a critical step when using Formal Concept Analysis (FCA) in a free text document retrieval framework. Optimal attributes as document descriptors should produce smaller, clearer and more browsable concept lattices with better clustering features. In this paper we focus on the automatic selection of noun phrases as document descriptors to build an FCA-based IR framework. We present three different phrase selection strategies which are evaluated using the Lattice Distillation Factor and the Minimal Browsing Area evaluation measures. Noun phrases are shown to produce lattices with good clustering properties, with the advantage (over simple terms) of being better intensional descriptors from the user’s point of view.
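
To make the FCA setting concrete, the sketch below builds the formal concepts for a tiny document-by-descriptor context. It is only an illustration under stated assumptions: the documents, the noun phrases, and the function names are hypothetical, the brute-force enumeration is not the authors' method, and it does not compute the Lattice Distillation Factor or the Minimal Browsing Area.

```python
from itertools import combinations

# Hypothetical toy context: document id -> set of noun-phrase descriptors.
CONTEXT = {
    "d1": {"concept lattice", "information retrieval"},
    "d2": {"concept lattice", "noun phrase"},
    "d3": {"information retrieval", "noun phrase"},
    "d4": {"concept lattice", "information retrieval", "noun phrase"},
}


def common_descriptors(docs, context):
    """Descriptors shared by every document in `docs`.

    For an empty `docs` this is the full descriptor set, as in standard FCA.
    """
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)


def matching_documents(descriptors, context):
    """Documents that carry every descriptor in `descriptors`."""
    return frozenset(d for d, attrs in context.items() if descriptors <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Exponential in the number of documents, so suitable for toy inputs only.
    """
    docs = sorted(context)
    concepts = set()
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_descriptors(subset, context)
            extent = matching_documents(intent, context)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(extent), "<->", sorted(intent))
```

Each printed pair is one node of the concept lattice: the intent is the set of noun phrases that labels the node, and the extent is the cluster of documents it groups. Choosing fewer or more discriminative descriptors changes how many such nodes a user has to browse.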

Similar resources

Using Noun Phrase Heads to Extract Document Keyphrases

Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores. This paper describes a ...

Automatic hypertext information retrieval in a corporate memory using noun phrases in context

In this paper, we describe a method to generate an information retrieval hypertext structure on a large collection of homogeneous documents by generating links only between noun phrases that are pertinent for navigation. Noun phrases are selected by automatic extraction and filtered on the basis of the linguistic context class where they appear, also determined automatically.

Automatic titling of Articles Using Position and Statistical Information

This paper describes a system that facilitates information retrieval in a set of textual documents by tackling the automatic titling and subtitling problem. Automatic titling here consists of extracting relevant noun phrases from texts as candidate titles. An original approach combining statistical criteria with the positions of noun phrases in the text helps collect relevant titles and subtitles. So, the...

Recognising Complex Prepositions Prep+N+Prep as Negative Patterns in Automatic Term Extraction from Texts

This work is a study of the delimitation of complex prepositions (CP) as lexical units, items of a computational lexicon that includes compounds and phrases. In addition, given the utmost importance of spotting noun phrases (NP) in document retrieval systems, parsing prepositional structures such as “Prep1 N Prep2 X” prevents the fragment “N Prep2 X” from being detected as a noun phrase, i.e. t...

Noun phrases as building blocks for cross-language Search Assistance

This paper presents a Foreign-Language Search Assistant that uses noun phrases as fundamental units for document translation and query formulation, translation and refinement. The system (a) supports the foreign-language document selection task providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information...

Publication date: 2005